Learning Multi-modal Similarity
نویسندگان
چکیده
In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning technique for integrating heterogeneous data into a single, unified similarity space. Our algorithm learns an optimal ensemble of kernel transformations which conform to measurements of human perceptual similarity, as expressed by relative comparisons. To cope with the ubiquitous problems of subjectivity and inconsistency in multimedia similarity, we develop graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure.
منابع مشابه
Grounding Semantics in Olfactory Perception
Multi-modal semantics has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in olfactory (smell) data, through the construction of a novel bag of chemical compounds model. We use standard evaluations for multi-modal semantics, including measuring conceptual similarity and cross-modal zero-shot learning. To our knowledge, ...
متن کاملDeep Similarity Learning for Multimodal Medical Images
An effective similarity measure for multi-modal images is crucial for medical image fusion in many clinical applications. The underlining correlation across modalities is usually too complex to be modelled by intensity-based statistical metrics. Therefore, approaches of learning a similarity metric are proposed in recent years. In this work, we propose a novel deep similarity learning method th...
متن کاملLearning to Hash on Partial Multi-Modal Data
Hashing approach becomes popular for fast similarity search in many large scale applications. Real world data are usually with multiple modalities or having different representations from multiple sources. Various hashing methods have been proposed to generate compact binary codes from multi-modal data. However, most existing multimodal hashing techniques assume that each data example appears i...
متن کاملMulti- and Cross-Modal Semantics Beyond Vision: Grounding in Auditory Perception
Multi-modal semantics has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics, including measuring conceptual similarity and relatedness. We also evaluate cross-modal mappings, through a zero-shot learning task mapping between linguistic and auditory...
متن کاملLearning the Similarity Measure for Multi-Modal 3D Image Registration
Multi-modal image registration is a challenging problem in medical imaging. The goal is to align anatomically identical structures, however, their appearance in images acquired with different imaging devices, such as for example CT or MR, may be very different. Registration algorithms generally try to deform one image, the floating image, such that it matches with a second, the reference image,...
متن کاملSupervised Coupled Dictionary Learning with Group Structures for Multi-modal Retrieval
A better similarity mapping function across heterogeneous high-dimensional features is very desirable for many applications involving multi-modal data. In this paper, we introduce coupled dictionary learning (DL) into supervised sparse coding for multi-modal (crossmedia) retrieval. We call this Supervised coupleddictionary learning with group structures for MultiModal retrieval (SliM). SliM for...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Journal of Machine Learning Research
دوره 12 شماره
صفحات -
تاریخ انتشار 2011